{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "93de5736",
   "metadata": {},
   "source": [
    "# Lesson 4: Sentence Embeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9363203",
   "metadata": {},
   "source": [
    "- In the classroom, the libraries are already installed for you.\n",
    "- If you would like to run this code on your own machine, you can install the following:\n",
    "``` \n",
    "    !pip install sentence-transformers\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "632f7aac-6786-4ea6-8fa3-25a6cebbd2e5",
   "metadata": {},
   "source": [
    "- Here is some code that suppresses warning messages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "058015a6-19cf-4f80-940d-f4af86cb589c",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "from transformers.utils import logging\n",
    "logging.set_verbosity_error()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c35a8e72",
   "metadata": {},
   "source": [
    "### Build the `sentence embedding` pipeline using 🤗 Transformers Library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ed9cec8-803a-4d7e-99d9-c4c84682901c",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "from sentence_transformers import SentenceTransformer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd5dbb50-8c36-456c-ac0e-f724429c4b7f",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "model = SentenceTransformer(\"all-MiniLM-L6-v2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cad701ed",
   "metadata": {},
   "source": [
    "More info on [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe7d23e0-5e68-4537-8dd8-eb125e1a6820",
   "metadata": {
    "height": 63
   },
   "outputs": [],
   "source": [
    "sentences1 = ['The cat sits outside',\n",
    "              'A man is playing guitar',\n",
    "              'The movies are awesome']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c33db645-edd8-4a28-a06f-e0fd8200f27f",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "embeddings1 = model.encode(sentences1, convert_to_tensor=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75de3f4a-bd8e-41d6-847b-9a3a043adeeb",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "embeddings1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5136886d-80d4-4a3a-a68e-692c25496b51",
   "metadata": {
    "height": 63
   },
   "outputs": [],
   "source": [
    "sentences2 = ['The dog plays in the garden',\n",
    "              'A woman watches TV',\n",
    "              'The new movie is so great']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7e0c68f",
   "metadata": {
    "height": 46
   },
   "outputs": [],
   "source": [
    "embeddings2 = model.encode(sentences2, \n",
    "                           convert_to_tensor=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a213124-ea97-4706-bbf4-737490e94244",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "print(embeddings2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41c3d585",
   "metadata": {},
   "source": [
    "* Calculate the cosine similarity between two sentences as a measure of how similar they are to each other."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e3b38a5-9b35-49de-9f85-c62583d6287d",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "from sentence_transformers import util"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39c1f4f3-94ad-4b5e-a40d-c4ba8277815b",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "cosine_scores = util.cos_sim(embeddings1,embeddings2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6859d46-15a7-4f61-8a9f-06a15baeff40",
   "metadata": {
    "height": 29
   },
   "outputs": [],
   "source": [
    "print(cosine_scores)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fae8571e-2dea-4872-b244-342731b949de",
   "metadata": {
    "height": 97
   },
   "outputs": [],
   "source": [
    "for i in range(len(sentences1)):\n",
    "    print(\"{} \\t\\t {} \\t\\t Score: {:.4f}\".format(sentences1[i],\n",
    "                                                 sentences2[i],\n",
    "                                                 cosine_scores[i][i]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49863f4c",
   "metadata": {},
   "source": [
    "### Try it yourself! \n",
    "- Try this model with your own sentences!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
